Abstract

Current tests for contagion in so- 
cial network studies are vulnerable to 
the confounding effects of latent ho- 
mophily (i.e., ties form preferentially 
between individuals with similar hid- 
den traits). We demonstrate a gen- 
eral method to bound the strength 
of causal effects in observational so- 
cial network studies, even in the pres- 
ence of arbitrary, unobserved individ- 
ual traits. Our tests require no para- 
metric assumptions and each test is as- 
sociated with an algebraic proof. We 
demonstrate the effectiveness of our 
approach by correctly deducing the 
causal effects for examples previously 
shown to expose defects in existing 
methodology. Finally, we discuss pre- 
liminary results on data taken from 
the Framingham Heart Study. 

Christakis and Fowler's paper suggesting that obesity may spread
along social ties [3] has sparked years of discussion about what
constitutes evidence of contagion in observational social network
studies (see, e.g., this recent review [2]). The most general result
from the causal modeling perspective shows that latent homophily acts
as a confounder for contagion [14]. Therefore, a unique identification
of the strength of contagion is impossible without additional
parametric assumptions. However, if the true goal is to test for the
presence of contagion, a lower bound on the strength of contagion is
all that is necessary. We present a general method to obtain such
bounds in this paper.

Identifying causal effects in social networks 
through intervention is often impractical or 
even unethical. Measuring all the human traits 
that affect link formation and observed ac- 
tions is unrealistic. A method to measure the 
strength of causal effects without recourse to 
these alternatives is of central importance for 
studying social networks. Our method produces 
a sequence of bounds on the strength of conta- 
gion which converge in some limit to the best 
possible bounds. In this sense, our method is 
the best solution to the problem of measuring 
the strength of contagion that does not involve 
invoking additional (parametric) assumptions. 

1 Model and Method 

We have two actors, Alice (A) and Bob (B), whose actions or attributes
we observe at discrete time steps, $t = 1, \ldots, T$. In Fig. 1 we
depict a Bayesian network that incorporates both contagion (also known
as social influence) and homophily, following Shalizi and Thomas [14].
The only difference is that we are more pessimistic in that we will
consider all the attributes of Alice ($R_A$) and Bob ($R_B$) to be
hidden. In this figure, we condition on $E$, the presence of a directed
edge from Alice to Bob. The formation of such an edge depends in some
arbitrary way on the hidden attributes of Alice and Bob. We often refer
to this process as latent homophily, even though the edge formation
process is unrestricted and could, e.g., be heterophilous instead
(i.e., edges are more likely to form between actors with different
attributes).

We observe some sequence of actions $(A_1, \ldots, A_T)$ (sometimes
abbreviated $A_{1:T}$ or simply $A$) and $B_{1:T}$. Given $E$, what
correlations are possible between $A$ and $B$? Below we use standard
results about graphical models [T^] for the network in Fig. 1 along
with some simple manipulations using Bayes' rule. We also employ the
common shorthand that capital letters represent random variables and
we suppress their instantiation when no ambiguity arises, i.e.,
$P(A_1) = P(A_1 = a_1)$.

$$P(A_{1:T}, B_{1:T} \mid E) = \sum_{R_A, R_B} P(R_A, R_B \mid E)\, P(A_1 \mid R_A)\, P(B_1 \mid R_B) \prod_{t=2}^{T} P(A_t \mid A_{t-1}, R_A)\, P(B_t \mid B_{t-1}, A_{t-1}, R_B) \tag{1.1}$$

We additionally require that the transitions are stationary, i.e.,
$\forall t, t'$, $P(A_t \mid A_{t-1}, R_A) = P(A_{t'} \mid A_{t'-1}, R_A)$,
and similarly for $B$. In principle, we could allow the actions for
Alice and Bob to come from any finite discrete set, but for simplicity
we will consider $A_t, B_t \in \{0, 1\}$ from here on. Surprisingly, we
can allow the size of the hidden attribute space to be infinite.

The Bayesian network in Fig. 1 represents a class of models that can be
specified by setting the conditional probability distributions for each
node, $P(X \mid \mathrm{parents}(X))$. In this case, unidentifiability
means that a particular (non-experimental) probability distribution
over observed variables does not pick out a unique model. Therefore,
the strength of the dependence of $B_t$ on $A_{t-1}$ is also not
uniquely determined. In technical terms, the presence of a back-door
path from $A_{t-1}$ to $B_t$ is a confounder preventing a unique
identification of the causal effect of $A_{t-1}$ on $B_t$ (see Q31 [H]).
Our goal is not a unique identification of the strength of the causal
effect, but rather to establish a lower bound on the strength. We begin
by slicing up the space of all possible models according to the
strength of the causal effect.

Figure 1: A slice of a Bayesian network representing both latent
homophily and contagion (dotted line). We observe a sequence of actions
for A and B that depend on previous actions and on hidden attributes
$R_A, R_B$. This graph is conditioned on the presence of a directed edge
in the social network, $E$, from A to B, whose formation depends on
$R_A, R_B$.

Definition 1.1. We define $\Delta$-causal models for
$\Delta = [\delta_l, \delta_u]$ as the set $\mathcal{P}_\Delta$,
consisting of probability distributions, $P(A, B \mid E)$, s.t. there
exist conditional probability distributions that satisfy Eq. 1.1 and
that additionally satisfy:

$$\delta_l \le P(B_t = 1 \mid B_{t-1}, A_{t-1} = 1, R_B) - P(B_t = 1 \mid B_{t-1}, A_{t-1} = 0, R_B) \le \delta_u, \tag{1.2}$$

for all possible values of $B_{t-1}, R_B$.

The class of models specified by $\delta_l = \delta_u = 0$ we denote by
$\mathcal{P}_0$ and refer to as non-causal models.



The quantity in Eq. 1.2 is conventionally referred to as the average
causal effect of treatment, or just average treatment effect, where the
treatment in this case refers to Alice's action and the effect is
measured on Bob [TU]. We are really bounding the average treatment
effect for every sub-population defined by $R_B$. Identifying whether a
distribution is in the set of non-causal models is of special interest:
because $\delta_l = \delta_u = 0$, $P(B_t \mid B_{t-1}, A_{t-1}, R_B)$
simplifies to $P(B_t \mid B_{t-1}, R_B)$.

1.1 Simple Example 

Consider a simple function of the observed variables
$c(A_{1:T}, B_{1:T})$, or $c(A, B)$. The expectation value of this
function is

$$\langle c(A,B) \rangle_P = \sum_{A,B \in \{0,1\}^T} P(A, B \mid E)\, c(A, B).$$

Set $T = 4$ and consider a specific observable,

$$c^{(1)}(A_{1:T}, B_{1:T}) = \left( \mathbf{1}\{A_2 = B_2 \neq A_3 = B_3\} - \mathbf{1}\{A_2 = B_3 \neq A_3 = B_2\} \right) \times \left( 1 - \mathbf{1}\{A_1 \neq A_4\}\, \mathbf{1}\{B_1 \neq B_4\} \right). \tag{1.3}$$

This operator can only take values 0 or $\pm 1$, so its average must
lie in this range. Using Def. 1.1, simple but tedious algebra verifies
that

$$\forall P \in \mathcal{P}_0, \quad \langle c^{(1)}(A,B) \rangle_P = 0. \tag{1.4}$$

While a fact like this is straightforward to verify, it offers little
understanding. In the rest of the paper, we develop methods to find
equalities (and inequalities) of this form. Moreover, we focus the
search by looking for useful tests so that, e.g., we find conditions
that are satisfied $\forall P \in \mathcal{P}_0$, but are violated by
models which contain contagion.

For instance, we define a simple model of influence, $P_\delta(A, B)$,
in which $P(A_t = 0) = P(A_t = 1) = P(B_1 = 0) = P(B_1 = 1) = 1/2$ and
$B_t = A_{t-1}$ with probability $\delta$; otherwise $B_t$ randomly
becomes 0 or 1. The "average treatment effect" in this case is just
$\delta$. We can easily see that

$$\langle c^{(1)}(A,B) \rangle_{P_\delta} = -\frac{3}{16}\delta.$$

Even for a tiny amount of influence Eq. 1.4 is violated, demonstrating
that the distribution $P_\delta(A, B)$ cannot be explained by a
non-causal model, even with an infinite number of hidden attributes.
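Both claims in this example can be checked by brute-force enumeration over the $2^{2T} = 256$ joint sequences. The sketch below is only an illustration under stated assumptions: the function names, the five-component mixture, and its randomly drawn rational parameters are hypothetical and do not come from the paper's code. It evaluates the observable of Eq. 1.3 exactly (using rational arithmetic) for a finite mixture of independent Markov chains, which is one instance of a non-causal model, and for the delayed influence model described above.

```python
from fractions import Fraction
from itertools import product
import random

T = 4
SEQS = list(product((0, 1), repeat=T))
HALF = Fraction(1, 2)

def c1(a, b):
    """Observable of Eq. 1.3; a[0] holds A_1, so A_2 is a[1], and so on."""
    middle_flips = a[1] != a[2]
    aligned = int(middle_flips and a[1] == b[1] and a[2] == b[2])
    crossed = int(middle_flips and a[1] == b[2] and a[2] == b[1])
    ends_both_differ = int(a[0] != a[3] and b[0] != b[3])
    return (aligned - crossed) * (1 - ends_both_differ)

def chain_prob(seq, p_start0, p_up, p_down):
    """Probability of seq under a stationary binary Markov chain."""
    prob = p_start0 if seq[0] == 0 else 1 - p_start0
    for prev, cur in zip(seq, seq[1:]):
        if prev == 0:
            prob *= p_up if cur == 1 else 1 - p_up
        else:
            prob *= p_down if cur == 0 else 1 - p_down
    return prob

def expect_noncausal(components):
    """<c1> for a finite mixture of independent A and B chains."""
    return sum(w * chain_prob(a, *pa) * chain_prob(b, *pb) * c1(a, b)
               for w, pa, pb in components for a in SEQS for b in SEQS)

def expect_delayed_influence(delta):
    """<c1> when B_t copies A_{t-1} with probability delta, else is uniform."""
    total = Fraction(0)
    for a in SEQS:
        for b in SEQS:
            prob = HALF ** T * HALF  # uniform A_1..A_T and uniform B_1
            for t in range(1, T):
                prob *= (1 - delta) * HALF + (delta if b[t] == a[t - 1] else 0)
            total += prob * c1(a, b)
    return total

# Hypothetical mixture: 5 hidden states with arbitrary rational parameters.
rng = random.Random(0)

def rand_params():
    return tuple(Fraction(rng.randint(1, 9), 10) for _ in range(3))

raw = [rng.randint(1, 9) for _ in range(5)]
mixture = [(Fraction(r, sum(raw)), rand_params(), rand_params()) for r in raw]

print(expect_noncausal(mixture))                 # 0
print(expect_delayed_influence(Fraction(1, 5)))  # -3/80, i.e. -3*delta/16
```

Exact fractions are used so that the non-causal expectation comes out as an exact zero rather than a small floating-point residue.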

1.2 Finding Useful Tests 

Determining if a (non-experimental) probability distribution is
compatible with a class of models defined by Def. 1.1 seems hopeless
because the number of parameters depends on the size of the hidden
attribute space, which can be infinite. Luckily, as we have just seen,
we can find simple conditions which all distributions in
$\mathcal{P}_\Delta$ satisfy. A distribution that violates one of these
conditions is incompatible with the associated class of models.

We begin by considering a candidate probability distribution,
$\hat{P}(A, B)$, a class of models (specified by a set of
distributions) $\mathcal{P}$, and some observable, $c(A, B)$. We are
looking for the following condition to be satisfied.

$$\langle c(A,B) \rangle_{\hat{P}} - \langle c(A,B) \rangle_P \ge \gamma > 0, \quad \forall P \in \mathcal{P}$$

If this condition is satisfied then $c(A, B)$ constitutes a statistical
test that is bounded $\forall P \in \mathcal{P}$, but is violated for
the distribution $\hat{P}$. Looking for an observable $c(A, B)$ with
associated bound $\gamma$ leads us to an optimization problem.

$$\begin{aligned}
&\max_{\gamma,\, c(A,B)} \gamma, \quad \text{s.t.} \\
&-\gamma + \langle c(A,B) \rangle_{\hat{P}} - \langle c(A,B) \rangle_P \ge 0, \quad \forall P \in \mathcal{P}
\end{aligned} \tag{1.5}$$

Our goal is to transform this optimization problem into a sequence of
linear programs (LPs), so that the lower bound, $\gamma$, becomes
successively tighter as we increase the size of the LP. To that end, we
will represent the expectation values in terms of polynomials. This
allows us to represent the condition in the second line using a result
about representations of non-negative polynomials which we include
here.

Theorem 1.1. (Handelman's representation [7]) Any polynomial,
$\gamma - h(x)$, that is positive on a compact domain
$\mathcal{K} = \{x : g_1(x) \ge 0, \ldots, g_s(x) \ge 0\}$, where the
$g_i(x)$ are linear, can be written in this form,

$$\gamma - h(x) = \sum_{k \in \mathbb{N}^s} \lambda_k \prod_{i=1}^{s} g_i(x)^{k_i},$$

using non-negative $\lambda$'s, with $k$ representing a vector of
non-negative integers.

Because the RHS consists of a sum of products of non-negative
quantities on $\mathcal{K}$, we can see the LHS should be non-negative.
The theorem ensures any positive polynomial can be written in this
form. The main drawback, however, is that it does not say how many
terms are required. Although this shortcoming is remedied in [4], those
bounds are often impractical. Instead, we can bound the number of terms
so that $\sum_i k_i \le d_{\max}$, and therefore $\gamma$ is an upper
bound for $h(x)$ on $\mathcal{K}$ that becomes progressively tighter as
we increase $d_{\max}$. As a bonus, providing a concrete representation
in terms of $\lambda$'s provides a certificate, or an algebraic proof,
that $\gamma$ is an upper bound for $h(x)$ on $\mathcal{K}$ (see
Sec. 1.4 for an explicit example).



Now we can proceed to re-write the optimization problem in Eq. 1.5 as
an LP using Handelman's representation. First, looking at Eq. 1.1 and
using convexity, we see that, for $P \in \mathcal{P}_\Delta$,

$$\langle c(A,B) \rangle_{P \in \mathcal{P}_\Delta} \le \max_{P'} \langle c(A,B) \rangle_{P'} = \max_{P(\cdot \mid \cdot)} \sum_{A,B \in \{0,1\}^T} c(A,B)\, P(A_1 \mid R_A)\, P(B_1 \mid R_B) \prod_{t=2}^{T} P(A_t \mid A_{t-1}, R_A)\, P(B_t \mid A_{t-1}, B_{t-1}, R_B). \tag{1.6}$$

The maximization is over conditional probability distributions that
satisfy normalization, positivity, and the condition in Eq. 1.2. We
think of the conditional probability distributions as variables, e.g.,
$x_1 = P(A_1 = 0 \mid R_A)$; normalization is ensured by writing
$P(A_1 = 1 \mid R_A) = 1 - x_1$; positivity corresponds to conditions
like $g_1(x) = x_1 \ge 0$, $g_2(x) = 1 - x_1 \ge 0, \ldots$; and Eq. 1.2
corresponds to more complicated linear inequalities involving these
variables that depend on $\delta_l, \delta_u$. We represent all these
linear inequalities with the set $\mathcal{K}_\Delta$. We will give a
more concrete demonstration of this mapping in the next section.



To complete the transformation of Eq. 1.6 we also consider the vector
of variables, $c$, whose elements we will sometimes index
$c_{AB} = c(A, B)$, where the concatenated binary sequences $A$ and $B$
should be interpreted as an integer in $[0, 2^{2T} - 1]$. We can do the
same to represent the candidate distribution $\hat{P}(A, B)$ as a
vector $p$, and then expectation values are just dot products. Writing
$f(x)$ for the vector of model probabilities expressed as polynomials
in the variables $x$, we re-write this equation as

$$\langle c(A, B) \rangle_{P \in \mathcal{P}_\Delta} \le \max_{x \in \mathcal{K}_\Delta} c \cdot f(x)$$
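The indexing convention in the preceding paragraph can be made concrete with a small sketch. The bit order used here (the bits of $A$ first, as the high-order bits, followed by the bits of $B$) is an assumption, as are the helper names `joint_index` and `as_vector`; the text only says that the concatenated sequences are read as one integer. A small $T$ is used so the vectors stay short.

```python
from itertools import product

T = 2  # kept small so the vectors are easy to inspect
SEQS = list(product((0, 1), repeat=T))

def joint_index(a, b):
    """Read the bits of A followed by the bits of B as one integer."""
    index = 0
    for bit in tuple(a) + tuple(b):
        index = (index << 1) | bit
    return index

def as_vector(func):
    """Tabulate func(A, B) as a vector of length 2**(2T)."""
    vec = [0.0] * (2 ** (2 * T))
    for a in SEQS:
        for b in SEQS:
            vec[joint_index(a, b)] = func(a, b)
    return vec

# Hypothetical observable (1 when A equals B) and a uniform distribution.
c_vec = as_vector(lambda a, b: 1.0 if a == b else 0.0)
p_vec = as_vector(lambda a, b: 1.0 / 2 ** (2 * T))

# The expectation value is then just a dot product.
expectation = sum(c * p for c, p in zip(c_vec, p_vec))
print(expectation)  # 0.25
```

With this layout, the observable and the distribution become plain vectors, which is what allows the constraint to be written in terms of $c \cdot p$ and $c \cdot f(x)$.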



We can ensure $-\gamma + \langle c(A,B) \rangle_{\hat{P}}$ is an upper
bound for the RHS (which is the condition written in the second line of
Eq. 1.5) by ensuring that $-\gamma + c \cdot p - c \cdot f(x)$ has a
Handelman representation (and is therefore non-negative).

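That leads us to the following form of the optimization in Eq. 1.5.

$$\begin{aligned}
&\max_{\gamma,\, c,\, \lambda} \gamma, \quad \text{s.t.} \\
&-\gamma + c \cdot p - c \cdot f(x) = \sum_{k \in \mathbb{N}^s} \lambda_k \prod_{i=1}^{s} g_i(x)^{k_i}, \\
&\gamma, \lambda \ge 0, \quad c_i \in [-1, 1], \quad \sum_i k_i \le d_{\max}
\end{aligned} \tag{1.7}$$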

Equating the terms of the polynomials on both sides of the second line
results in linear equalities among the variables. W.l.o.g. we restrict
the $c_i$'s to some fixed range. This optimization turns out to be a
linear program (LP), and hence can be efficiently solved in polynomial
time. The feasibility of this LP proves that there is a linear
inequality that is obeyed by all distributions in $\mathcal{P}_\Delta$
but is violated by the distribution $\hat{P}$. Namely, we have shown
that

$$\forall P \in \mathcal{P}_\Delta, \quad \langle c(A,B) \rangle_P \le \langle c(A,B) \rangle_{\hat{P}} - \gamma.$$

So obviously if $\hat{P} \in \mathcal{P}_\Delta$ this would lead to a
contradiction (assuming $\gamma$ is positive).

Not only does this LP provide us with a concrete bound and the size of
the violation by $\hat{P}$, $\gamma$, but the $\lambda$'s can also be
interpreted as an algebraic proof of the upper bound. The main factor
determining the size of the LP is the number of variables, $\lambda_k$,
which is determined by the number of terms we use in our Handelman
representation. Mathematica can solve LPs with hundreds of thousands of
variables and our code is available [1]. In the next section, we
provide a more concrete formulation of this optimization.

1.3 Non-Causal Models 

We give a more explicit formulation of Eq. 1.7 for the special case of
non-causal models. In this case, each variable sequence, $A_{1:T}$, is
a mixture of Markov chains with associated transition probabilities
that depend on the unknown value of $R_A$. We denote by $\alpha_+$
($\alpha_-$) the probability that $A$ flips from 0 (1) to 1 (0) at some
time step and $\alpha_0 = P(A_1 = 0)$. We have similar parameters for
$B$: $\beta_+, \beta_-, \beta_0$. We use just $\alpha$ or $\beta$ when
possible to avoid writing out all three.

$$q_A(\alpha) = P(A_{1:T} \mid R_A) = \alpha_0^{1 - A_1} (1 - \alpha_0)^{A_1}\, \alpha_+^{F_{01}(A)}\, \alpha_-^{F_{10}(A)}\, (1 - \alpha_+)^{F_{00}(A)}\, (1 - \alpha_-)^{F_{11}(A)} \tag{1.8}$$

The same equations hold replacing $A$ with $B$ and $\alpha$ with
$\beta$. $F_{ij}(A)$ counts the number of transitions from state $i$ to
$j$ in string $A$. If $A, A'$ have the same initial state and the same
transition counts (e.g., $(0, 0, 1, 0)$ and $(0, 1, 0, 0)$), they are
said to be partially exchangeable because they clearly have the same
probability of occurring. This observation alone, especially extended
to joint strings on $A$ and $B$, imposes serious constraints on
possible observed probabilities and explains the existence of
equalities like Eq. 1.3. We discuss tests based on this idea and their
relationship to de Finetti theorems in Appendix A.
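As a worked instance of Eq. 1.8 for the two strings mentioned above, write $\alpha_0 = P(A_1 = 0)$ and $\alpha_+$, $\alpha_-$ for the probabilities of flipping from 0 to 1 and from 1 to 0. Both strings start in state 0 and contain one $0 \to 0$, one $0 \to 1$, and one $1 \to 0$ transition, so the same factors appear in a different order:

```latex
\begin{align*}
q_A\big((0,0,1,0)\big) &= \alpha_0 \,(1-\alpha_+)\, \alpha_+ \, \alpha_-, \\
q_A\big((0,1,0,0)\big) &= \alpha_0 \, \alpha_+ \, \alpha_- \,(1-\alpha_+).
\end{align*}
```

The two products agree for every value of the parameters, and therefore also after averaging over any distribution of the hidden attribute.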



In this case, the bounds imposed by Eq. 1.2 are trivial and have
already been taken into account by eliminating the dependence of $B_t$
on $A_{t-1}$ in defining the variables above. This leaves us with only
12 inequality constraints to enforce positivity, two for each variable:
$g_1(\alpha, \beta) = \alpha_0 \ge 0$,
$g_2(\alpha, \beta) = 1 - \alpha_0 \ge 0, \ldots,
g_{11}(\alpha, \beta) = \beta_+ \ge 0$,
$g_{12}(\alpha, \beta) = 1 - \beta_+ \ge 0$. Using
$f_{AB}(\alpha, \beta) = q_A(\alpha)\, q_B(\beta)$ and these definitions
for $g_i(\alpha, \beta)$, we can plug these into Eq. 1.7 to search for
bounds that are satisfied by probability distributions explained by
non-causal models, $\mathcal{P}_0$.

1.4 A Sample Bound 

As a simple example of how we can use LPs to give bounds, we begin by
bounding $P(A = (0, 0, 1))$ for $P \in \mathcal{P}_0$. We see that

$$P(A = (0,0,1)) \le \max_{\alpha_0, \alpha_+ \in [0,1]} \alpha_0 (1 - \alpha_+) \alpha_+$$

from Eq. 1.8 and Eq. 1.6. We are looking for an upper bound $\gamma$ so
that

$$\gamma - \alpha_0 (1 - \alpha_+) \alpha_+ \ge 0 \text{ on } \mathcal{K},$$

with $\mathcal{K} = \{\alpha_0, \alpha_+ : \alpha_0, \alpha_+, (1 - \alpha_+), (1 - \alpha_0) \ge 0\}$.
Casting the problem as in Eq. 1.7 with $d_{\max} = 3$ (with $c$ a
constant in this case), we get an LP with 36 variables (counting
$\gamma$) whose solution results in the following representation.

$$\frac{1}{3} - \alpha_0 (1 - \alpha_+) \alpha_+ = \frac{1}{3}(1 - \alpha_+)^3 + \frac{1}{3}\alpha_+^3 + (1 - \alpha_0)(1 - \alpha_+)\alpha_+$$

The RHS constitutes an algebraic proof that $\gamma = 1/3$ is an upper
bound for $P(A = (0, 0, 1))$ on this domain. Although in principle we
can generate algebraic proofs for all bounds presented in the paper,
they are unwieldy for all but the simplest examples. Instead we provide
code to generate bounds [1].
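The certificate for this sample bound is simple enough to check mechanically. The sketch below is an independent check, not the paper's code, and its function names are hypothetical. It compares $\frac{1}{3} - \alpha_0(1-\alpha_+)\alpha_+$ with $\frac{1}{3}(1-\alpha_+)^3 + \frac{1}{3}\alpha_+^3 + (1-\alpha_0)(1-\alpha_+)\alpha_+$ in exact rational arithmetic. Both sides are polynomials of degree at most 1 in $\alpha_0$ and at most 3 in $\alpha_+$, so agreement on a 5-by-5 grid of points is enough to establish that they are identical.

```python
from fractions import Fraction
from itertools import product

THIRD = Fraction(1, 3)

def lhs(a0, ap, gamma=THIRD):
    """gamma minus the polynomial being bounded."""
    return gamma - a0 * (1 - ap) * ap

def rhs(a0, ap):
    """Sum of products of the non-negative constraint functions."""
    return THIRD * (1 - ap) ** 3 + THIRD * ap ** 3 + (1 - a0) * (1 - ap) * ap

grid = [Fraction(i, 4) for i in range(5)]  # 0, 1/4, 1/2, 3/4, 1
identity_holds = all(lhs(a0, ap) == rhs(a0, ap)
                     for a0, ap in product(grid, grid))
largest_on_grid = max(a0 * (1 - ap) * ap for a0, ap in product(grid, grid))

print(identity_holds)   # True
print(largest_on_grid)  # 1/4
```

Every term on the right-hand side is a product of factors that are non-negative on the domain, so the identity shows that the bounded quantity never exceeds 1/3 there. The value 1/4 attained at $\alpha_0 = 1$, $\alpha_+ = 1/2$ indicates that the bound at this $d_{\max}$ is valid but not tight.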



1.5 Equality Constraints 

A special case of Eq. 1.7 occurs when we set $d_{\max} = 0$.
Essentially, we are looking for $c$ so that $c \cdot f(x) = 0$, and we
do this by ensuring that the coefficient of each monomial,
$x_1^{k_1} x_2^{k_2} \cdots$, equals zero. Putting the constraints from
all these coefficients together leads to some matrix $M$ so that
$Mc = 0$. If $T = 3$, the null space of $M$ has dimension 0, but if
$T = 4$ and we consider $\mathcal{P}_0$ (so that $f(x)$ is set by
Eq. 1.8), the null space of $M$ has a dimension of 60. This implies
that there are 60 linearly independent equalities like the one in
Eq. 1.3 that are satisfied by each probability distribution
$P(A, B \mid E) \in \mathcal{P}_0$. A 28-dimensional subspace of these
equalities is satisfied for any $\mathcal{P}_\Delta$. For $T = 4$, all
of these equalities can be understood in terms of partial
exchangeability; see Appendix A. Alternately, we can look for
particular useful equalities like Eq. 1.3 or another considered in
Appendix B.
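The dimensions quoted above can be cross-checked by counting, under the assumption (stated in the text for $T = 4$) that all the equalities come from partial exchangeability. The sketch below does not build the matrix $M$; it groups binary strings by initial state and transition counts, and then counts how many independent equalities between joint probabilities follow when both sequences can be exchanged within their classes. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

def pe_signature(seq):
    """Initial state plus the four transition counts F_ij."""
    counts = Counter(zip(seq, seq[1:]))
    return (seq[0],) + tuple(counts[(i, j)] for i in (0, 1) for j in (0, 1))

def count_jpe_equalities(T):
    """Independent equalities P(A, B) = P(A', B') implied by exchangeability.

    A class containing m joint sequences contributes m - 1 independent
    equalities, so the total is (#joint sequences) - (#joint classes).
    """
    seqs = list(product((0, 1), repeat=T))
    n_classes = len({pe_signature(s) for s in seqs})
    return len(seqs) ** 2 - n_classes ** 2

print(count_jpe_equalities(3))  # 0
print(count_jpe_equalities(4))  # 60
```

For $T = 4$ there are 14 classes among the 16 single sequences, giving $256 - 196 = 60$, which agrees with the null space dimension reported in the text; for $T = 3$ every class has a single member, so no equalities arise.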

In the next section, we focus on inequalities satisfied by all
distributions in $\mathcal{P}_0$, so that violation of these
inequalities can rule out a non-causal model.

2 Results 

Shalizi and Thomas (ST) illustrate how the 
confounding effects of latent homophily cause 
standard tests for contagion to fail with two 
examples [14]. In the first, they demonstrate
a non-causal model that looks like contagion. 
In the second, they consider a simple copying 
model and show how the observed results ap- 
pear to be explained by homophily. Our tests 
correctly identify the underlying mechanism in 
both cases. 

2.1 Homophily Looks Like Contagion 

A now popular test for contagion introduced 
in [3] considers unreciprocated, directed edges 
so that A can influence B, but not vice versa. 
Then, if we regress B's action based on A's history, versus
regressing A's action based on B's
history, we should see an asymmetry in the size 
of the regression coefficient if A influences B. 
ST's example shows that this asymmetry can 
be reproduced by latent homophily as long as 
there is an asymmetry in the edge formation 
mechanism. E.g., consider all nodes to take 
some static hidden attribute in the range [0,1]. 
We say that nodes are more likely to form links 
with someone who has a similar attribute (ho- 
mophily), but they also tend to prefer people 
whose attribute is closer to the median, 0.5, 
leading to an asymmetry in preference of edge 
formation. If each node's state at each time step 
only depends on their hidden attribute and their 
previous state, we have a model which clearly 
has no influence. However, ST show that this 
model does reproduce asymmetries in regression 
coefficients which would be interpreted as a sign 
of influence. 

We ran the code that ST provided in their paper [14], making only one
change so that the state of each node at each time step is a binary
variable. For a given graph, we consider all pairs of nodes, A, B, so
that there is a directed edge from A to B. Then we look at the
frequency of observing a given joint sequence of states for
$A_{1:4}, B_{1:4}$, and we use this to construct the empirical
probability distribution, $P_{LH}(A, B \mid E)$. We estimated this
distribution based on $M = 400,000$ samples. As a first test, we can
consider the equality constraints that should be satisfied for any
non-causal model, given by Eq. 1.3, Sec. [5].



$$\langle c^{(1)} \rangle_{P_{LH}} = (+1499 - 1493)/400000$$

$$\langle c^{(2)} \rangle_{P_{LH}} = (+20006 - 19871)/400000$$

For a non-causal model we expect exactly 0, but for an empirical
distribution the results are not exact. In this case, we can calculate
a simple confidence bound. Because $c^{(1)}, c^{(2)}$ take only the
values $0, \pm 1$, and we are trying to determine if the mean value is
nonzero, we can use the binomial distribution to give a probability of
getting an excess of $+1$ (heads) over $-1$ (tails) for a fair coin.
The p-values we get are 0.54 and 0.25, respectively, which are not
extreme enough to rule out the null hypothesis that
$P_{LH} \in \mathcal{P}_0$.
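The fair-coin comparison can be sketched as follows. This is an illustration under assumptions that are not taken from the paper: it uses a one-sided test and a normal approximation with continuity correction in place of the exact binomial tail, and the function name is hypothetical. The paper does not state its exact convention (one- or two-sided, handling of the boundary count), so values computed this way need not match the reported ones in every case. The counts used are those quoted above for the second observable.

```python
import math

def one_sided_sign_test(n_plus, n_minus):
    """Approximate P(heads >= n_plus) for a fair coin with n_plus + n_minus flips.

    Samples where the observable is 0 carry no information about the sign
    and are ignored, so only the +1 and -1 counts enter.
    """
    n = n_plus + n_minus
    mean = n / 2
    sd = math.sqrt(n) / 2
    z = (n_plus - 0.5 - mean) / sd         # continuity correction
    return 0.5 * math.erfc(z / math.sqrt(2))

p_value = one_sided_sign_test(20006, 19871)
print(round(p_value, 2))  # 0.25
```

An excess of 135 heads in roughly 40,000 informative samples is well within what a fair coin produces, which is why the null hypothesis is not rejected.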

The previous test is only one of many conditions we expect non-causal
models to satisfy. A more comprehensive test is to take the empirical
distribution, $P_{LH}$, and plug it in to Eq. 1.7 (we used
$d_{\max} = 9$). The result is an observable $c_{LH}(A, B)$,
$\gamma_{LH} = 0.0024$, so that $\forall P \in \mathcal{P}_0$,
$\langle c_{LH} \rangle_{P_{LH}} - \langle c_{LH} \rangle_P \ge 0.0024$.
How can we interpret this result? Because we have optimized our test
based on the data, we cannot apply a straightforward confidence bound.
For a random vector in $2^{2T}$-dimensional space, we want to estimate
the difference between the sample mean (empirical distribution) and
true mean. Based on the central limit theorem, we expect the average
Euclidean distance between the two to just be $1/\sqrt{M}$. In Fig. 2
we see that the lower bound on the Euclidean distance between
$\mathcal{P}_0$ and $P_{LH}$, which is $\gamma/|c_{LH}| = 0.00022$, is
much less than we would expect from error in sampling the empirical
distribution, $\approx 0.0016$, so once again we fail to reject the
null hypothesis that $P_{LH} \in \mathcal{P}_0$.
Figure 2: Average Euclidean distance between the vector of empirical
frequencies estimated with M samples and the vector of true
probabilities from which the samples were drawn. Compared is the
Euclidean deviation from $\mathcal{P}_0$ by two empirical
distributions.
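The $1/\sqrt{M}$ scale for sampling error can be illustrated with a short simulation. For $M$ independent draws from a distribution $p$, the expected squared Euclidean distance between the empirical frequencies and $p$ is $(1 - \sum_i p_i^2)/M$, which is at most $1/M$. The sketch below is a hypothetical illustration, not the procedure behind Fig. 2: it uses a uniform distribution over $2^{2T} = 256$ outcomes, a fixed seed, and a smaller sample size than the paper's $M = 400,000$ to keep it fast.

```python
import math
import random
from collections import Counter

def sampling_distance(probs, n_samples, rng):
    """Euclidean distance between empirical frequencies and true probabilities."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n_samples)
    counts = Counter(draws)
    return math.sqrt(sum((counts[i] / n_samples - p) ** 2
                         for i, p in enumerate(probs)))

rng = random.Random(0)
dim = 2 ** 8                 # number of joint sequences for T = 4
uniform = [1 / dim] * dim
M = 20000                    # hypothetical sample size for the demo
dist = sampling_distance(uniform, M, rng)

print(dist * math.sqrt(M))        # close to 1
print(1 / math.sqrt(400000))      # about 0.0016, the scale quoted in the text
```

Comparing a bound such as $\gamma/|c|$ against this scale gives a rough sense of whether an apparent violation could be explained by sampling noise alone.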



2.2 Contagion Looks Like Homophily 

Figure 3: An example of simple copying dynamics on a network in which
color (red/green) spuriously appears to evolve according to the static
node type (circle/square).

In this case, we construct a simple network (see Fig. 3), where each
node is defined by a static trait, square or circle. Links are more
likely between nodes of the same type. We start each node randomly in
the state red or green. We then evolve the state of the graph by
repeatedly picking an edge and then copying the state of one node to
its neighbor. After many iterations, we observe the new state of the
graph. ST point out that by looking at the dynamics it can appear that
a tendency to become green, e.g., is explained by the static attribute
of being a square, while circles tend to red. However, this is just
transient behavior caused by the homophilous network structure; the
node type has nothing to do with the copying mechanism.
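The copying dynamics described above can be sketched in a few lines. Everything specific in this sketch is an assumption made for illustration: the network size, the link probabilities `p_same` and `p_diff`, the use of directed edges, the number of copy steps between observations, and the function names. It is not the code used to generate the samples in this section.

```python
import random

def build_homophilous_graph(n_nodes, p_same, p_diff, rng):
    """Random directed graph in which same-type nodes link more often."""
    types = [rng.choice(("circle", "square")) for _ in range(n_nodes)]
    edges = [(i, j)
             for i in range(n_nodes) for j in range(n_nodes)
             if i != j
             and rng.random() < (p_same if types[i] == types[j] else p_diff)]
    return types, edges

def copy_step(colors, edges, rng):
    """Pick one edge at random and copy the source's color to the target."""
    src, dst = rng.choice(edges)
    colors[dst] = colors[src]

rng = random.Random(1)
types, edges = build_homophilous_graph(30, p_same=0.3, p_diff=0.02, rng=rng)
colors = [rng.choice(("red", "green")) for _ in types]

history = [list(colors)]           # observed state at t = 1
for _ in range(3):                 # three further observations, t = 2..4
    for _ in range(200):           # many copy events between observations
        copy_step(colors, edges, rng)
    history.append(list(colors))
```

The node type never enters `copy_step`, yet because edges mostly connect nodes of the same type, colors tend to become uniform within each type, which is the spurious pattern the text describes.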

Again, we generate $M = 400,000$ samples using this model (details and
code in [2]), generating an empirical distribution, $P_{copy}$. Our
test easily identifies contagion in this case. For instance,
$\langle c^{(1)} \rangle_{P_{copy}} = 0.062$, for which the p-value
under the null hypothesis is $\approx 10^{-3000}$. If, on the other
hand, we solve Eq. 1.7 using $P_{copy}$, we get
$\gamma_{copy}/|c_{copy}| = 0.0095$, which is much larger than we would
expect from error in sampling the empirical distribution (0.0016, see
Fig. 2).



2.3 Experimental Results 

We have done a preliminary analysis of the Framingham Heart Study (FHS)
data, a longitudinal social network study which includes many
covariates (e.g., obesity, marriage, depression, smoking, alcohol) and
link types (friend, neighbor, co-worker, sibling, spouse) [3], [2].
Methodologically, we proceed in an identical fashion to the previous
sections. For instance, using the test in Eq. 1.3 for statistics about
observed changes in Body Mass Index (BMI) of (non-related) friends, we
can rule out a non-causal model with 99% confidence. Since even this
claim is controversial, we have included the relevant statistics for
observing various sequences in Appendix C so that the violation of
Eq. 1.3 can be verified directly. A complete analysis of FHS data will
appear in future work.

3 Related work 

Christakis and Fowler (CF) were not the first to look for contagion in
observational social network studies, but their study on obesity [3]
marks the beginning of an eruption of methodological introspection and
critique which CF have recently summarized and addressed [2]. While
most of the responses to that work center around the robustness of
various parametric modeling assumptions (e.g., sensitivity analysis
[H]), our central concern is with the broader, so far unanswered,
critique leveled by ST that latent homophily poses a significant
barrier to identifying contagion [14].

The main difference from ST's paper is that instead of identifying the
strength of causal effects, we put lower bounds on these effects (which
would seem to be the primary aim for most practitioners). In this
sense, we are actually in the realm of "partial identifiability" [S],
an approach ST themselves suggest as an open problem in the final
section of their paper [14].

The approach we have taken here considers probability distributions as
points in some high-dimensional vector space. In this context we can
apply tools from algebraic geometry to answer a variety of questions
about set membership. While the possibility of this approach was
recognized long ago [8], those approaches relied on computationally
infeasible, exact methods. Recent advances in convex relaxations for
algebraic geometry problems [11] make this approach feasible. This
perspective was considered in a general context [IB], but the result
here differs in several ways. Most importantly, casting the
optimization problem here as an LP instead of a semi-definite program
allows us to address more complex problems. Ultimately, this added
power allows us to solve the specific, open problem of bounding the
strength of contagion in a non-parametric way.

4 Discussion 

When it comes to human behavior, measuring 
and controlling for every variable that might be 
relevant is unrealistic. For that reason, it is 
important to obtain the best possible bounds 
on causal strength with the fewest possible as- 
sumptions. In the context of a specific graphical 
model like Fig. 1 we can unambiguously state
our result as a lower bound on the strength of 
contagion. In the real world, contagion is only 
one of many effects that could cause deviation 
from the non-causal model. For example, com- 
mon external causation could be a factor. Even 
in that situation our test can provide valuable 
insight. For instance, we can test whether a 
distribution belongs to the set of models which 
include latent homophily and unbounded con- 
tagion strength. If not, we can conclude that 
other factors must be involved (we found this 
to be the case for smoking, for example). 

The methods outlined in this paper provide a 
powerful way to statistically test for causal ef- 
fects, even in the presence of confounding vari- 
ables, while invoking minimal assumptions. Fu- 
ture work will apply these tests to an in-depth 
analysis of causal effects in the Framingham 
Heart Study data. 
A Joint Partial Exchangeability

Consider the set of pairs $z, z' \in \{0,1\}^T$ that obey the relation
$z =_e z'$. Here $=_e$ is an equivalence relation that denotes partial
exchangeability (PE). Two sequences are partially exchangeable if and
only if they have the same initial state and the same count for each
transition (which we called $F_{ij}(Z)$ in the text). Clearly, a
stationary Markov chain produces partially exchangeable sequences with
equal probability, because the probabilities depend only on the initial
state and the number of each transition. Even if we average over Markov
chains with unknown, arbitrary transition matrices, this property is
preserved. A famous result known as the de Finetti theorem for Markov
chains is that in the limit of long sequences, PE (along with a small
technical requirement that there is always a probability of visiting
some state) is also a sufficient condition for data to be described as
a mixture of Markov chains [5]. Essentially, we have a theorem proving
that a symmetry property justifies a particular form for a latent
variable graphical model.

In the context of the null model, $\mathcal{P}_0$ (model 1 in Fig. 4),
where we ask whether Alice and Bob's sequences are both described by
mixtures of Markov chains, this suggests a simple test. For each joint
sequence $A, B$ we should see that $P(A, B) = P(A', B')$ if $A =_e A'$
and $B =_e B'$. We call this notion joint partial exchangeability
(JPE), which differs from PE, because statistics for Alice and Bob's
sequences could both be PE while failing to be JPE (i.e.,
$P(A) = P(A')$, $P(B) = P(B')$, $P(A, B) \neq P(A', B')$). As an example
of the different possibilities consider the sequences
$x = (0,0,1,0)$, $y = (0,1,0,0)$, $z = (0,0,0,1)$. Clearly, $x =_e y$.
Consider also the alternative models in Fig. 4. We can view model (3),
for instance, as a mixture of stationary Markov chains on the joint
variable $(A_t, B_t)$, but this implies different exchangeability
properties. Going back to our example, we see the difference.

Equality                                  Models satisfying equality
P(A = x, B = x) = P(A = y, B = y)         (1), (2), (3)
P(A = x, B = x) = P(A = x, B = y)         (1)
P(A = z, B = x) = P(A = z, B = y)         (1), (2)

For longer sequences of observations, this type of test would be
computationally much easier than solving the optimization suggested in
Eq. 1.7, which is exponential in $T$. De Finetti's theorem suggests
that in the long $T$ limit, this test may even be sufficient.

This test has another nice interpretation in terms of a symmetry
condition. If there is influence from $A$ to $B$, some (but not all) of
the symmetries are eliminated (see Sec. 1.5). If there is influence in
both directions, an even smaller (but nonzero) set of symmetries
remains. Essentially, model checking then becomes a question of
(approximately) matching symmetries in the data to the appropriate
models. Interestingly, these symmetries differ from typical conditional
independence relations. Because we have a mixture of Markov processes,
none of the observed variables are independent. A similar perspective
has been applied for testing the order of a Markov process [13].

B Another Equality 

Figure 4: Three model variations with different exchangeability
properties.

We can look for equalities that are useful for identifying certain
types of influence by solving Eq. 1.7 using some influence model for
$\hat{P}$ and setting $d_{\max} = 0$. We already saw an example in
Eq. 1.3, which was obtained by setting $T = 4$ and defining
$\hat{P}(A, B)$ so that $P(A_t = 0) = P(A_t = 1) = 1/2$ and $B_t = A_t$
with probability $\delta$; otherwise $B_t$ randomly becomes 0 or 1.
This can be thought of as an "instant" influence model, where Alice's
choice at time $t$ influences Bob's choice at time $t$. In contrast, we
had previously defined a delayed influence model, $P_\delta(A, B)$,
where $B_t = A_{t-1}$ with probability $\delta$. Solving Eq. 1.7 using
$P_\delta$ leads to another equality constraint,
$\forall P \in \mathcal{P}_0$, $\langle c^{(2)} \rangle_P = 0$, listed
in Table 1. For models with influence, on the other hand, we have the
following.

The value of $c^{(2)}(A, B)$ for a particular $A \in \{0,1\}^4$,
$B \in \{0,1\}^4$ can be read from the following table. Simply consider
the sequence $A$ or $B$ as a binary number and pick the $A$-th row and
the $B$-th column. This equality is the maximally violated one for a
particular model of influence.

C FHS Example 

We consider a particular example from the Framingham Heart Study
regarding BMI (a BMI greater than 30 is defined as "obese"). For
details about the dataset, see [3], [S]. We considered waves 4, 5, 6,
and 7 of the original and offspring cohort, and considered pairs
$A, B$ where $B$ nominated $A$ as a friend, and $A$ and $B$ are not
related. Because our goal is to rule out the non-causal model, the
timing of the edge creation is not a factor. We defined the binary
variable $A_t = 0$ (1) to indicate that $A$'s BMI did not (did)
increase by more than the median amount since the last survey. This
definition was intended to reduce the effect of dynamic factors
influencing all actors in the same way. The median change in BMI for
these 4 waves was (0.55, 0.57, 0.42, 0.20). As we mention in the
conclusion, we can also test whether the data is consistent with a
model including latent homophily and unbounded contagion as a way of
identifying the importance of other unmodeled causal effects. We were
not able to rule out that $P_{BMI} \in \mathcal{P}_\Delta$, but could
rule out $P_{BMI} \in \mathcal{P}_0$ with high confidence.

In Table 2 we list the number of times each joint sequence of actions
for $(A_{1:4}, B_{1:4})$ was observed over four time periods. We
reference counts for a given sequence using the same conventions as in
the previous table. Consider the sequence $A$ or $B$ as a binary number
and pick the $A$-th row and the $B$-th column. We give raw counts
(instead of a normalized probability) so that confidence analysis can
be done. This example is discussed in Sec. 2.3.
Table 1: Specifying an observable for which
$\forall P \in \mathcal{P}_0$, $\langle c^{(2)}(A, B) \rangle_P = 0$.

Table 2: Counts for observing the joint sequence (A, B) in the FHS
data.